SIMULATION OF HPC JOB SCHEDULING AND LARGE-SCALE PARALLEL WORKLOADS
Authors
Mohammad
Abstract
This paper presents a simulator designed specifically for evaluating job scheduling algorithms on large-scale HPC systems. The simulator is built on the Performance Prediction Toolkit (PPT), a parallel discrete-event simulator written in Python for rapid assessment and performance prediction of large-scale scientific applications on supercomputers. The proposed job scheduler simulator incorporates PPT's application models and, when coupled with sufficiently detailed architecture models, can represent job runtime behavior more realistically. Consequently, it can evaluate different job scheduling and task mapping algorithms on specific target HPC platforms more accurately.
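To make the idea of simulating a job scheduler concrete, here is a minimal sketch of an event-driven, first-come-first-served (FCFS) scheduling simulation. It is not PPT's API and not the authors' implementation: the `Job` class, the `simulate_fcfs` function, and the fixed `runtime` field are all hypothetical names chosen for illustration. In the simulator described above, job runtimes would come from application and architecture models rather than from a constant.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    submit: float   # submission time
    nodes: int      # number of nodes requested
    runtime: float  # assumed fixed here; a real simulator would model it


def simulate_fcfs(jobs, total_nodes):
    """Return {job_id: (start, end)} under strict FCFS with no backfilling."""
    queue = sorted(jobs, key=lambda j: (j.submit, j.job_id))
    running = []        # min-heap of (end_time, nodes) for running jobs
    free = total_nodes  # nodes currently idle
    now = 0.0           # simulation clock
    schedule = {}
    for job in queue:
        if job.nodes > total_nodes:
            raise ValueError(f"job {job.job_id} can never fit on the machine")
        # Advance the clock to the submission time if the machine is ahead of it.
        now = max(now, job.submit)
        # Release nodes held by jobs that have already finished.
        while running and running[0][0] <= now:
            _, released = heapq.heappop(running)
            free += released
        # Not enough nodes: jump to successive completion events until it fits.
        while free < job.nodes:
            end, released = heapq.heappop(running)
            now = max(now, end)
            free += released
        schedule[job.job_id] = (now, now + job.runtime)
        free -= job.nodes
        heapq.heappush(running, (now + job.runtime, job.nodes))
    return schedule


if __name__ == "__main__":
    demo = [Job(0, 0.0, 4, 10.0), Job(1, 1.0, 4, 5.0), Job(2, 2.0, 8, 3.0)]
    for jid, (start, end) in simulate_fcfs(demo, total_nodes=8).items():
        print(jid, start, end)
```

Because the policy is strict FCFS, the third job in the demo waits until both earlier jobs have released their nodes, even though half of the machine becomes idle sooner. Comparing such outcomes across policies is the kind of evaluation a scheduling simulator supports.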
Similar resources
Multiple objective scheduling of HPC workloads through dynamic prioritization
We have developed an efficient single queue scheduling system that utilizes a greedy knapsack algorithm with dynamic job priorities. Our scheduler satisfies high level objectives while maintaining high utilization of the HPC system or collection of distributed resources such as a computational GRID. We provide simulation analysis of our approach in contrast with various scheduling strategies of...
The Effect of Real Workloads and Synthetic Workloads on the Performance of Job Scheduling for Non-Contiguous Allocation in 2D Mesh Multicomputers
The performance evaluation of non-contiguous allocation has traditionally been carried out by means of simulations based on synthetic workloads, and it can be significantly affected by the job scheduling strategy used to determine the order in which jobs are selected for execution. To validate the performance of the non-contiguous allocation algorithms, there has been a need to evaluate the algorit...
BSLD Threshold Driven Parallel Job Scheduling for Energy Efficient HPC centers
Recently, power awareness in the high performance computing (HPC) community has increased significantly. While CPU power reduction of HPC applications using Dynamic Voltage Frequency Scaling (DVFS) has been explored thoroughly, CPU power management for large-scale parallel systems at the system level has been left unexplored. In this paper we propose a power-aware parallel job scheduler assuming DVFS enable...
Scalable Resource Management in Cloud Computing
The exponential growth of data and application complexity has brought new challenges in the distributed computing field. Scientific applications are growing more diverse, with workloads ranging from traditional MPI high performance computing (HPC) to fine-grained, loosely coupled many-task computing (MTC). Traditionally, these workloads have been shown to run well on supercomputers and high...
Scheduling Gangs in a Distributed System
In this paper we study the performance of parallel job scheduling in a distributed system. A special type of scheduling called gang scheduling is considered. In gang scheduling, jobs consist of a number of interacting tasks that are scheduled to run simultaneously on distinct processors. Two gang scheduling policies are used to schedule parallel jobs for two different types of job parallelism....